ABSTRACT. From $k$ independent populations $P_1,\dots,P_k$, which belong to a one-parameter exponential family $\{F_\theta\}$, $\theta\in\Omega\subseteq\mathbb{R}$, random samples of sizes $m_1,\dots,m_k$, respectively, are to be drawn. After the observations have been drawn, a selection procedure will be used to determine which of these $k$ populations has the largest value of $\theta$. Given a loss for selections at each parameter configuration, given $n$ past observations, and given a prior for the $k$ parameters, a Bayes selection procedure can be found and its Bayes risk can be determined, where both depend on $m_1,\dots,m_k$. Let the sample sizes be restricted by $m_1+\dots+m_k=m$, where $m$ is fixed. The problem of how to find the optimum (minimum Bayes risk) sample design subject to this constraint is considered, as well as $m$-truncated sequential sampling allocations. Results for normal and binomial families, under the “0-1” loss and the linear loss, are presented and discussed. An introduction to Bayes selection procedures is included.

1.  Introduction

Let $P_1,\dots,P_k$ be $k$ populations which belong to a given one-parameter exponential family $\{F_\theta\}$, $\theta\in\Omega\subseteq\mathbb{R}$, where $P_i$ is associated with a certain parameter $\theta_i\in\Omega$, $i=1,\dots,k$, but the values of $\theta_1,\dots,\theta_k$ are unknown. Suppose one wants to select, based on independent random samples of respective sizes $m_1,\dots,m_k$, that population which has the largest $\theta$-value, $\theta_{[k]}$, say. In the decision theoretic approach, let $L(\theta,i)$ be a given loss for selecting population $P_i$ at any $\theta=(\theta_1,\dots,\theta_k)\in\Omega^k$, $i=1,\dots,k$. Two special types of loss functions, which will be primarily considered later on, are the so-called “0-1” loss, $L(\theta,i)=0$ if $\theta_i=\theta_{[k]}$ and $1$ otherwise, and the linear loss $L(\theta,i)=\theta_{[k]}-\theta_i$, $i=1,\dots,k$. The performance of a selection rule $d$, i.e. a measurable function from the sampling space into the set $\{1,\dots,k\}$, can then be measured by its expected loss, i.e. its frequentist risk, at each parameter configuration $\theta\in\Omega^k$. Extending this framework to the Bayes approach, it is assumed that the parameters $\Theta=(\Theta_1,\dots,\Theta_k)$, say, are a priori random and follow a known prior density $\pi(\theta)$, $\theta\in\Omega^k$. The purpose of this paper is to provide an introduction to Bayes selection procedures, a brief review of multi-stage selection procedures, and a thorough discussion of recent results on Bayes look-ahead sampling designs for selection procedures.

The history of selection procedures dates back to the 1950s. The first such procedures were based on $k$ independent samples of equal size. An overview and thorough discussion of these early procedures, and of the development of the numerous branches of selection theory until 1979, is provided in the by now classical monograph by Gupta and Panchapakesan (1979). In celebration of “40 Years of Statistical Selection Theory” and its pioneers Robert E. Bechhofer, Shanti S. Gupta, and Milton Sobel, a conference was held on September 5-10, 1993, at Bad Doberan in Germany, and its proceedings were published in a special journal issue, edited by Miescke and Rasch (1996).

To reduce the effort or cost of sampling without losing too much power in the decisions, selection procedures which incorporate combinations of various types of sampling, stopping, and selection components have been proposed and studied in the literature over the past three decades. In their fundamental monograph, Bechhofer, Kiefer, and Sobel (1968) derived optimum sequential selection rules for exponential families in the frequentist approach. These are based on vector-at-a-time sampling, i.e. sampling the same number of observations from each population at each stage, the natural terminal selection decision, and an optimum stopping rule. Elimination of populations which emerge as apparently inferior during the sampling process is not allowed there as an option at intermediate stages. It should be mentioned that if elimination of populations from further sampling is allowed, one may or may not extend this elimination to the pool of populations available for selection at the end. An overview of sequential ranking and selection procedures is provided by Gupta and Panchapakesan (1991).

A simple selection procedure, based on vector-at-a-time sampling, which incorporates elimination from sampling and selection, is a two-stage selection procedure of the following type. After $t_1$ observations have been drawn from each population at Stage 1, a suitable set of populations is eliminated from further sampling. Then $t_2$ observations are drawn from each population that has not been screened out, and a final selection is made from the latter, using all of the data observed from them. Here an option exists regarding the number of populations retained for Stage 2: it can be random or fixed in advance. The former case has been studied in Gupta and Miescke (1984a) in the Bayes approach. Both cases, and their extensions to multi-stage selection procedures, have been treated, also in the Bayes approach, by Gupta and Miescke (1984b), using backward optimization. An overview of this class of procedures, which select the best population efficiently in terms of risk and sampling costs, is provided by Miescke (1984). The results derived in this respect depend heavily on the assumption that at each sampling stage an equal number of observations, which may, however, vary from stage to stage, is drawn from each population still in the running. This assumption allows one to exploit permutation symmetry in the posterior risks in connection with permutation invariant priors, which simplifies the analysis of such procedures.

Whenever a population is eliminated from the pool of populations retained for a final selection, a conflict of the following type may arise. The data collected from such a population prior to its elimination could make it look better again, relative to the other populations, after further sampling from the latter has turned out to be not so favorable for them. The Bayes approach clearly calls for utilizing the information from all observations that have been drawn, since more observations cannot increase the Bayes risk. Using this approach, it may in fact occur that such an eliminated population emerges, in terms of the posterior risk, as the population that appears to be the best. Elimination of populations from final selections is thus unreasonable from a Bayesian point of view. Elimination, or just temporary elimination, of populations from sampling at some stages of the sampling process can be incorporated in a natural way into the Bayes approach. The advantage of inherent permutation symmetry, however, is lost here. In conclusion, allocation of possibly unequal numbers of observations, some of which may actually be zero, to the $k$ populations at the various stages seems to be more appropriate. Temporarily drawing no new observations from certain populations, but retaining all $k$ populations in the pool for the final selection decision, may be called a soft elimination. Such a soft elimination invites the use of priors which are not permutation symmetric, since updated priors of this type occur anyway in a natural manner at the various stages, due to soft elimination.

In many statistical experiments, the sampling process extends over a substantial length of time. One of the advantages of the Bayesian approach in a statistical analysis is that it allows one to draw conclusions at intermediate time points. In particular, such conclusions can be directed toward modifications of the future sampling and decision process. Such adaptive sampling or sampling allocation schemes will be the main topic of discussion later on, when Bayes look-ahead selection procedures are considered. The need to deal with samples of possibly unequal sizes from the $k$ populations may arise quite naturally. A test person may not always come on schedule or may drop out of the study, a test object may break under stress, a budget cut may force some test runs to be abandoned, or random time spans may be under study which are subject to some form of censoring.

The main type of problem considered in this paper is how to allocate observations to the $k$ populations in a stepwise manner, where the goal is to select the best population at the end of the sampling process. More precisely, assume that $k$ independent samples of respective sizes $n_1,\dots,n_k$ have been observed already at a first stage from populations $P_1,\dots,P_k$, which may be the combined outcomes of several previous stages, and that $m$ additional observations are allowed to be taken at a future second stage. One interesting problem that is considered later on is how to allocate $m_1,\dots,m_k$ observations, subject to $m_1+\dots+m_k=m$, in an optimum way among the $k$ populations, given all the information, prior and first stage observations, gathered so far. Looking ahead with the expected posterior Bayes risk, given the information presently at hand, and then minimizing it, not only provides an optimum allocation of observations in the future. It also allows one to assess how much better the final decision can be expected to be after further sampling has been done, following this optimum allocation. In marketing research such as direct marketing, medical research such as clinical trials (Whitehead, 1991), and social research such as survey sampling (Govindarajulu and Katehakis, 1991), interim analyses are very often performed at certain stages to decide if sampling should be continued, and if so, how to allocate new observations. Such Bayes designs have been studied in the binomial case, under various loss functions, by Gupta and Miescke (1993) for the more general problem of simultaneous selection and estimation, including cost of sampling. For the sake of simplicity of presentation, simultaneous estimation with selection and cost of sampling will only be considered briefly in the following. The former would require the use of more involved loss functions, and the latter the incorporation of stopping rules. Modifications of the allocations considered later on to accommodate such extended features are straightforward, but technically more involved.

Allocating  m  new  observations  at  a  second  stage,  using  the  expected  posterior  risk,  can 
only  be  done  after  the  terminal  selection  rule  is  known.  The  latter  is  the  Bayes  selection  which 
is  based  on  all  observations  drawn  in  the  complete  sampling  process.  Thus,  the  first  step 
toward  a  Bayes  design  for  the  second  stage  is  to  determine  the  optimum  (Bayes)  single-stage 
selection rules for various sample sizes $m_1,\dots,m_k$. This has been done, under both the “0-1”
loss  and  under  the  linear  loss,  for  the  binomial  case  in  Abughalous  and  Miescke  (1989), 
including  extensions  to  a  larger  class  of  loss  functions,  and  for  the  normal  case  in  Gupta  and 
Miescke  (1988). 
After the Bayes terminal selection decisions are known, one can proceed as described above: look ahead for all possible sampling allocations, which are in the present setting restricted by $m_1+\dots+m_k=m$, compare the associated expected posterior risks, find their minimum value, and then implement a design associated with it. At this point one may wonder why all $m$ observations are allocated at once, rather than allocating only a few (or just one), learning more through them (it), and then initiating a new allocation optimization process for the remaining allocations. As will be shown later, such a breakdown of the allocation of $m$ observations, if done properly, cannot increase the Bayes risk and may in fact be better than allocating all $m$ observations at once. The best possible allocation scheme is to allocate one observation at a time, in $m$ consecutive steps, which are altogether determined by backward optimization, starting at the end with the Bayes terminal selection for every possible allocation $m_1,\dots,m_k$ with $m_1+\dots+m_k=m$, and then optimizing successively every single allocation before. However, this appears to be feasible only for discrete distributions, and the only Bayes sequential design of this type that has been treated up to now is for the binomial case (Miescke and Park, 1997a). Alternative allocation schemes, which appear to perform close to the best based on backward optimization, are studied and discussed for the normal case in Gupta and Miescke (1994, 1996a), and for the binomial case in Gupta and Miescke (1996b) and Miescke and Park (1997a). Another type of adaptive sampling and selection for Bernoulli populations, which is in the frequentist approach, can be found in Bechhofer and Kulkarni (1982).

One reasonable procedure is to allocate in an optimum way one observation at a time, pretending that it is the last one to be drawn before final selection, and then to iterate this process until all $m$ observations have been taken. This will be considered later on in this paper. Other procedures, which may allocate more than one observation at a time, will also be considered. However, they appear to be less appealing, since with each new observation more can be learned about the unknown parameters, which in turn can improve the basis for further decisions. Look-ahead procedures, which have been utilized previously by Govindarajulu and Katehakis (1991) in survey sampling, and which are described and discussed in various other settings in Berger (1985), will be discussed thoroughly later in this paper.
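The "one observation at a time, as if it were the last" idea can be sketched as follows. This is a minimal Python illustration under assumptions that are not part of the text above: Bernoulli observations with independent Beta posteriors given as (alpha_i, beta_i) pairs, the linear loss, and a hypothetical callback `draw(j)` that returns True for a success from population j. The names `expected_max_after_one` and `look_ahead_allocation` are likewise hypothetical.

```python
import random


def expected_max_after_one(state, j):
    """E max_i E{Theta_i | data} if one more observation is taken from population j."""
    a, b = state[j]
    p = a / (a + b)                        # predictive success probability
    means = [ai / (ai + bi) for ai, bi in state]
    up, down = list(means), list(means)
    up[j] = (a + 1) / (a + b + 1)          # posterior mean after a success
    down[j] = a / (a + b + 1)              # posterior mean after a failure
    return p * max(up) + (1 - p) * max(down)


def look_ahead_allocation(state, m, draw):
    """Allocate m observations one at a time, each as if it were the last one."""
    state = list(state)
    for _ in range(m):
        # Ties are broken toward the lowest index in this sketch.
        j = max(range(len(state)), key=lambda i: expected_max_after_one(state, i))
        a, b = state[j]
        state[j] = (a + 1, b) if draw(j) else (a, b + 1)
    return state


rng = random.Random(1)
true_theta = (0.4, 0.5, 0.7)               # hypothetical; unknown in practice
final = look_ahead_allocation([(1, 1)] * 3, 12, lambda j: rng.random() < true_theta[j])
```

Each step only compares $k$ candidate populations, so the cost per observation is small compared with a full backward optimization over all remaining steps.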

Selecting in terms of the largest sample mean is called the natural selection rule in the literature. It is the uniformly best permutation invariant selection procedure, in the frequentist sense, for a general class of loss functions, as long as the sample sizes are equal. However, for unequal sample sizes, the natural selection rule appears to be less powerful, although it remains intuitively appealing. In view of this fact, optimum sample size allocations for the natural selection rule have been considered in the frequentist approach by Bechhofer (1969), Dudewicz and Dalal (1975), Bechhofer, Hayter, and Tamhane (1991), and Bechhofer, Santner, and Goldsman (1995). Bayes selection rules under unequal sample sizes can have complicated forms which may not be representable in closed form, as has been shown in Abughalous and Miescke (1989) and Gupta and Miescke (1988). Bayes rules for more involved normal models have been studied by Berger and Deely (1988) and Fong and Berger (1993). Earlier ideas and results on sampling allocations for Bayes rules under normality and the linear loss are due to Dunnett (1960).

An introduction to Bayes selection procedures is provided in Section 2. Here, as well as in the remaining sections, special emphasis is given to two specific models under the “0-1” loss and the linear loss. The first is the normal case with independent normal priors for the $k$ parameters, which is called the normal-normal model, and the second is the binomial case with independent beta priors, which is called the binomial-beta model. As a first step toward Bayes look-ahead sequential sampling designs, Bayes one- and two-stage sampling designs are studied in Section 3. Finally, Bayes look-ahead sequential sampling designs are treated in Section 4.


2.  Bayes  Selection  Procedures 

Let $P_1,\dots,P_k$ belong to a one-parameter exponential family $\{F_\theta\}$, $\theta\in\Omega\subseteq\mathbb{R}$, where $P_i$ is associated with a certain parameter $\theta_i\in\Omega$, $i=1,\dots,k$, but where the values of $\theta_1,\dots,\theta_k$ are unknown. Let the goal be to find that population which has the largest parameter value. Special emphasis will be given to the normal family $\{N(\theta,\sigma^2)\}$, $\theta\in\Omega=\mathbb{R}$, with $\sigma^2>0$ known, and to the binomial family $\{B(n,\theta)\}$, $\theta\in\Omega=[0,1]$. Let $X_1,\dots,X_k$ denote sufficient statistics from independent random samples of sizes $n_1,\dots,n_k$ from $P_1,\dots,P_k$, respectively. Since Bayes selection procedures are the topic, only non-randomized decision rules need to be considered in the following. These can be represented as measurable functions $d(x)$ with values in $\{1,\dots,k\}$, where $x=(x_1,\dots,x_k)$ are the observed values of $X=(X_1,\dots,X_k)$. In the decision theoretic approach, let $L(\theta,i)$ be the loss for selecting population $P_i$ at $\theta=(\theta_1,\dots,\theta_k)$, with $L(\theta,i)\le L(\theta,j)$ for $\theta_i\ge\theta_j$, $i,j=1,\dots,k$. Later on, emphasis will be given to two special loss functions, the “0-1” loss and the linear loss, which are defined by

\[
\begin{aligned}
L(\theta,i) &= 1-I_{\{\theta_{[k]}\}}(\theta_i), && \text{"0-1" loss,}\\
L(\theta,i) &= \theta_{[k]}-\theta_i, && \text{linear loss,}
\end{aligned}
\tag{1}
\]

where $\theta_{[k]}=\max\{\theta_1,\dots,\theta_k\}$, $i=1,\dots,k$.
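As a small illustration of the two loss functions in (1), the following Python sketch evaluates them at a parameter configuration. The function names and the example configuration are hypothetical, and populations are indexed from 0 rather than from 1.

```python
from typing import Sequence


def zero_one_loss(theta: Sequence[float], i: int) -> float:
    """'0-1' loss: 0 if population i attains the largest parameter, 1 otherwise."""
    return 0.0 if theta[i] == max(theta) else 1.0


def linear_loss(theta: Sequence[float], i: int) -> float:
    """Linear loss: shortfall of theta[i] relative to the largest parameter."""
    return max(theta) - theta[i]


theta = (0.2, 0.7, 0.5)  # hypothetical parameter configuration, k = 3
losses_01 = [zero_one_loss(theta, i) for i in range(len(theta))]
losses_lin = [linear_loss(theta, i) for i in range(len(theta))]
```

Both losses vanish for the best population; the linear loss additionally measures how far a wrong selection falls short.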
Finally, the Bayesian component is added to the problem. Here the parameters $\Theta=(\Theta_1,\dots,\Theta_k)$ are assumed to be a priori independent random variables which follow a prior distribution with a known density $\pi(\theta)=\pi_1(\theta_1)\times\dots\times\pi_k(\theta_k)$. For the normal family, normal priors $\Theta_i\sim N(\mu_i,\nu_i^{-1})$ with $\mu_i\in\mathbb{R}$, $\nu_i>0$ will be assumed, and for the binomial family, beta priors $\Theta_i\sim\mathrm{Beta}(\alpha_i,\beta_i)$, with $\alpha_i,\beta_i>0$, $i=1,\dots,k$. The frequentist risk of a selection rule $d$ at $\Theta=\theta$ is given by $R(\theta,d)=E_\theta(L(\theta,d(X)))$, and its Bayes risk by $r(\pi,d)=E^{\pi}(R(\Theta,d))$. The latter is minimized by every Bayes rule, and this minimum, $r(\pi)$, say, is called the Bayes risk of the problem. The Bayes risk of a rule $d$ can be represented in two ways as follows.

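\[
\begin{aligned}
r(\pi,d) &= E\big(L(\Theta,d(X))\big)\\
 &= E^{\pi}\big(E\{L(\Theta,d(X))\mid\Theta\}\big)\\
 &= E^{m}\big(E\{L(\Theta,d(X))\mid X\}\big),
\end{aligned}
\tag{2}
\]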

where $m$ denotes the marginal density or discrete probability function of $X$. The standard way of determining a Bayes rule $d^B$, say, is to minimize, at every $X=x$, the posterior expected loss, i.e. the posterior Bayes risk $E\{L(\Theta,d(x))\mid X=x\}$. Depending on the type of loss function, one arrives at the following criteria.

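\[
\begin{aligned}
E\{L(\Theta,d^B(x))\mid X=x\} &= \min_{i=1,\dots,k}E\{L(\Theta,i)\mid X=x\} && \text{in general,}\\
 &= 1-\max_{i=1,\dots,k}P\{\Theta_i=\Theta_{[k]}\mid X=x\} && \text{for "0-1" loss,}\\
 &= E\{\Theta_{[k]}\mid X=x\}-\max_{i=1,\dots,k}E\{\Theta_i\mid X=x\} && \text{for linear loss.}
\end{aligned}
\tag{3}
\]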

Apparently, under the linear loss, $E\{\Theta_{[k]}\mid X=x\}$ does not need to be considered for finding a Bayes rule. However, it is relevant for the evaluation of the Bayes risk $r(\pi)$, and thus it will be relevant for the Bayes designs to be considered in the subsequent sections. For the two special loss functions, the quantities to be minimized or maximized in (3) can be represented by

\[
\begin{aligned}
E\{L(\Theta,i)\mid X=x\} &= \int_{\mathbb{R}^k}L(\theta,i)\,\pi(\theta\mid x)\,d\theta, && \text{in general,}\\
P\{\Theta_i=\Theta_{[k]}\mid X=x\} &= \int_{\mathbb{R}}\prod_{j\ne i}\Big(\int_{-\infty}^{\theta_i}\pi_j(\theta_j\mid x_j)\,d\theta_j\Big)\pi_i(\theta_i\mid x_i)\,d\theta_i, && \text{for "0-1" loss,}\\
E\{\Theta_i\mid X=x\} &= \int_{\mathbb{R}}\theta_i\,\pi_i(\theta_i\mid x_i)\,d\theta_i, && \text{for linear loss,}
\end{aligned}
\tag{4}
\]

where $\pi(\theta\mid x)$ is the posterior density of $\Theta$, given $X=x$, and $\pi_r(\theta_r\mid x_r)$ is the posterior marginal density of $\Theta_r$, given $X=x$, $r=1,\dots,k$. In the second and third equation of (4), the fact that $\pi(\theta\mid x)=\pi_1(\theta_1\mid x_1)\times\dots\times\pi_k(\theta_k\mid x_k)$ is utilized. This means that a posteriori, $\Theta_1,\dots,\Theta_k$ are not only independent, but that the posterior distribution of $\Theta_r$, given $X=x$, depends on $x$ only through $x_r$, $r=1,\dots,k$.

In the remainder of this section, Bayes rules for the normal-normal and the binomial-beta model will be studied under the “0-1” loss and the linear loss. Because of the inherent independence, only the distributions associated with each individual $i$-th of the $k$ populations, $i\in\{1,\dots,k\}$, have to be specified. For the normal-normal model one has the following.


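\[
\begin{aligned}
\text{Normal-Normal:}\quad & X_i\mid\Theta=\theta\sim N(\theta_i,p_i^{-1}), && \Theta_i\sim N(\mu_i,\nu_i^{-1}),\\
& \Theta_i\mid X=x\sim N\Big(\frac{p_ix_i+\nu_i\mu_i}{p_i+\nu_i},\,(p_i+\nu_i)^{-1}\Big), && X_i\sim N(\mu_i,p_i^{-1}+\nu_i^{-1}),
\end{aligned}
\tag{5}
\]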


where $X_i$ is the sample mean of the $i$-th population, and $p_i=n_i\sigma^{-2}$ is its precision. The Bayes selection rules under “0-1” loss and linear loss can now be seen, in view of (3), (4), and (5), to be $d^B(x)=i_0$, if for $i=1,\dots,k$, the following respective quantity is maximized at $i=i_0$.

\[
\begin{aligned}
P\{\Theta_i=\Theta_{[k]}\mid X=x\} &= \int_{\mathbb{R}}\prod_{j\ne i}\Phi_j(\theta\mid x_j)\,d\Phi_i(\theta\mid x_i), && \text{for "0-1" loss,}\\
E\{\Theta_i\mid X=x\} &= \frac{p_ix_i+\nu_i\mu_i}{p_i+\nu_i}, && \text{for linear loss,}
\end{aligned}
\tag{6}
\]

where $\Phi_r(\cdot\mid x_r)$ is the c.d.f. of the conditional distribution of $\Theta_r$, given $X=x$, $r=1,\dots,k$, which is represented in (5).
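The two criteria in (6) can be evaluated numerically. The following Python sketch is one possible way to do so and is not the computational method of the cited papers: it uses the posterior in (5) and a plain trapezoidal rule for the integral, and the function names, grid settings, and data are hypothetical. Populations are indexed from 0.

```python
from statistics import NormalDist


def posterior(x_bar, n, sigma2, mu, nu):
    """Posterior mean and variance of Theta_i in the normal-normal model (5)."""
    p = n / sigma2                          # sampling precision p_i = n_i / sigma^2
    return (p * x_bar + nu * mu) / (p + nu), 1.0 / (p + nu)


def prob_best(posts, i, grid=4001, width=8.0):
    """Approximate P{Theta_i = Theta_[k] | X = x} in (6) by the trapezoidal rule."""
    mean_i, var_i = posts[i]
    dist_i = NormalDist(mean_i, var_i ** 0.5)
    others = [NormalDist(m, v ** 0.5) for j, (m, v) in enumerate(posts) if j != i]
    lo = mean_i - width * dist_i.stdev      # integrate over mean +/- width std devs
    step = 2 * width * dist_i.stdev / (grid - 1)
    total = 0.0
    for g in range(grid):
        t = lo + g * step
        f = dist_i.pdf(t)
        for d in others:
            f *= d.cdf(t)
        total += f * (0.5 if g in (0, grid - 1) else 1.0)
    return total * step


# Hypothetical data: sample means and unequal sample sizes, sigma^2 = 1,
# common prior mean 0 and prior precision 1.
x_bar, n = [1.0, 1.2, 0.9], [10, 2, 5]
posts = [posterior(x, m, 1.0, 0.0, 1.0) for x, m in zip(x_bar, n)]
probs = [prob_best(posts, i) for i in range(3)]
select_01 = max(range(3), key=lambda i: probs[i])       # Bayes rule, "0-1" loss
select_lin = max(range(3), key=lambda i: posts[i][0])   # Bayes rule, linear loss
```

In this example the population with the largest sample mean has only two observations, so its posterior mean is pulled strongly toward the prior mean and the linear-loss Bayes rule selects a different population than the largest sample mean would.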

At this point it should be mentioned that there is a natural selection rule $d^N$, say, which selects in terms of the largest of the sample means $X_1,\dots,X_k$. In terms of the frequentist risk, and for a large class of loss functions, including the “0-1” loss and the linear loss, it is the uniformly best permutation invariant selection procedure if the sample sizes $n_1,\dots,n_k$ are all equal. More generally, an analogous fact holds for monotone likelihood ratio families. The history of its proofs, one of which is in the Bayesian approach utilizing permutation invariant priors, can be found in Gupta and Miescke (1984b). For the present situation, where $n_1,\dots,n_k$ may not be equal, no optimum properties of $d^N$ under the “0-1” loss are known, except admissibility, which has been proved only recently in Miescke and Park (1997b). Gupta and Miescke (1988) have shown that $d^N$ is minimax under the “0-1” loss if and only if $n_1=\dots=n_k$. Here the minimax value of the problem is $1-1/k$, which can be proved with a suitable sequence of independent normal priors.

An undesirable property of $d^N$ under the “0-1” loss was first discovered (Lam and Chiu, 1976; Tong and Wetzell, 1979) by noting that the frequentist risk of $d^N$ is not always decreasing in each of the sample sizes $n_1,\dots,n_k$. On the other hand, under the linear loss, $d^N$ is a proper Bayes rule and, because of its uniqueness, also admissible (Berger, 1985). This can be readily seen (Gupta and Miescke, 1988) from (6), by letting $\mu_r=\mu$ and $\nu_r=c\,p_r$, $r=1,\dots,k$, for some fixed real $\mu$ and a positive $c$. Properties of Bayes rules for other priors are also discussed there, as well as in Berger and Deely (1988) and Fong and Berger (1993).
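The step from the linear-loss criterion to the natural rule can be written out. Assuming a common prior mean $\mu$ and prior precisions proportional to the sampling precisions, $\nu_r=c\,p_r$, the posterior mean in the normal-normal model becomes

```latex
\[
E\{\Theta_r \mid X = x\}
  = \frac{p_r x_r + \nu_r \mu_r}{p_r + \nu_r}
  = \frac{p_r x_r + c\,p_r\,\mu}{p_r + c\,p_r}
  = \frac{x_r + c\,\mu}{1 + c},
  \qquad r = 1,\dots,k .
\]
```

This is the same strictly increasing function of $x_r$ for every $r$, so the largest posterior mean belongs to the population with the largest sample mean, and the Bayes rule under this prior coincides with the natural selection rule.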
Turning now to the binomial-beta model, the situation regarding the distributions presents itself as follows:

\[
\begin{aligned}
\text{Binomial-Beta:}\quad & X_i\mid\Theta=\theta\sim B(n_i,\theta_i), && \Theta_i\sim\mathrm{Beta}(\alpha_i,\beta_i),\\
& \Theta_i\mid X=x\sim\mathrm{Beta}(\alpha_i+x_i,\,\beta_i+n_i-x_i), && X_i\sim PE(n_i,\alpha_i,\beta_i,1),
\end{aligned}
\tag{7}
\]

where $\alpha_i,\beta_i>0$, $i=1,\dots,k$. Here the unconditional marginal distribution of $X_i$, $i=1,\dots,k$, is a Polya-Eggenberger type distribution (Johnson and Kotz, 1969), sometimes called beta-binomial distribution, which in the present situation turns out to be

\[
P\{X_i=x_i\}=\binom{n_i}{x_i}\,\frac{\Gamma(\alpha_i+\beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\,\frac{\Gamma(\alpha_i+x_i)\,\Gamma(\beta_i+n_i-x_i)}{\Gamma(\alpha_i+\beta_i+n_i)},\qquad x_i=0,1,\dots,n_i.
\tag{8}
\]
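The marginal probability function (8) is easy to evaluate directly. The following Python sketch does so on the log scale for numerical stability; the function name and the example parameters are hypothetical.

```python
from math import comb, exp, lgamma


def beta_binomial_pmf(x: int, n: int, a: float, b: float) -> float:
    """Marginal P{X_i = x} from (8), with the gamma functions on the log scale."""
    log_ratio = (lgamma(a + b) - lgamma(a) - lgamma(b)
                 + lgamma(a + x) + lgamma(b + n - x) - lgamma(a + b + n))
    return comb(n, x) * exp(log_ratio)


# Hypothetical example: n_i = 5 with a Beta(2, 3) prior.
pmf = [beta_binomial_pmf(x, 5, 2.0, 3.0) for x in range(6)]
mean = sum(x * p for x, p in enumerate(pmf))
```

As a plausibility check, the probabilities sum to one and the mean equals $n_i\alpha_i/(\alpha_i+\beta_i)$; with a uniform Beta(1, 1) prior, every outcome $0,\dots,n_i$ is equally likely.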


The Bayes selection rules under “0-1” loss and linear loss can be seen, in view of (3), (4), and (7), to be $d^B(x)=i_0$, if for $i=1,\dots,k$, the following is maximized at $i=i_0$.

\[
\begin{aligned}
P\{\Theta_i=\Theta_{[k]}\mid X=x\} &= \int_{[0,1]}\prod_{j\ne i}F_j(\theta\mid x_j)\,dF_i(\theta\mid x_i), && \text{for "0-1" loss,}\\
E\{\Theta_i\mid X=x\} &= \frac{\alpha_i+x_i}{\alpha_i+\beta_i+n_i}, && \text{for linear loss,}
\end{aligned}
\tag{9}
\]

where $F_r(\cdot\mid x_r)$ is the c.d.f. of the conditional distribution of $\Theta_r$, given $X=x$, $r=1,\dots,k$, which is represented in (7).
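A rough numerical illustration of (9) is given below. The linear-loss criterion is computed exactly from the posterior means, while the “0-1” criterion is only estimated by simulation from the Beta posteriors in (7); this Monte Carlo shortcut is an assumption of the sketch and not a method taken from the cited papers. Function names, data, and the seed are hypothetical, and populations are indexed from 0.

```python
import random


def posterior_params(x, n, a, b):
    """Posterior Beta parameters (alpha_i + x_i, beta_i + n_i - x_i) from (7)."""
    return a + x, b + n - x


def prob_best_mc(posts, draws=20000, seed=0):
    """Monte Carlo estimate of P{Theta_i = Theta_[k] | X = x} for each i, cf. (9)."""
    rng = random.Random(seed)
    wins = [0] * len(posts)
    for _ in range(draws):
        sample = [rng.betavariate(a, b) for a, b in posts]
        wins[max(range(len(posts)), key=sample.__getitem__)] += 1
    return [w / draws for w in wins]


# Hypothetical data: successes and sample sizes for k = 3, with Beta(1, 1) priors.
x, n = [7, 2, 12], [10, 2, 20]
posts = [posterior_params(xi, ni, 1.0, 1.0) for xi, ni in zip(x, n)]
post_means = [a / (a + b) for a, b in posts]
probs = prob_best_mc(posts)
select_lin = max(range(3), key=post_means.__getitem__)   # Bayes rule, linear loss
select_01 = max(range(3), key=probs.__getitem__)         # estimated Bayes rule, "0-1" loss
```

The simulation error of the estimated probabilities decreases only at the usual square-root rate in `draws`, so close calls under the “0-1” loss would need numerical integration of (9) instead.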

At this point, again, the natural selection rule $d^N$, which selects in terms of the largest of the sample means $X_1/n_1,\dots,X_k/n_k$, has to be discussed. Since the binomial family is also an exponential family, $d^N$ is here, too, the uniformly best invariant selection rule if and only if the sample sizes are equal. Likewise, Abughalous and Miescke (1989) have shown that under the “0-1” loss, minimaxity of $d^N$ holds if and only if $n_1=\dots=n_k$. As in the normal case, the minimax value of the problem is $1-1/k$, which can be proved with a suitable sequence of independent beta priors. Other results regarding $d^N$ turn out to be different from their counterparts in the normal case. For example, $d^N$ is not a proper Bayes rule here. Properties of Bayes rules, for various priors, under the “0-1” loss and the linear loss have been studied in the same paper. Questions regarding the optimality of $d^N$ for unequal sample sizes $n_1,\dots,n_k$, in the present setting, have been addressed in Bratcher and Bland (1975), Risko (1985), and Abughalous and Miescke (1989).

There are two other types of Bayes selection procedures, which are based on different goals or philosophies. These will be described briefly at the end of this section. The first is within the subset selection approach, which is due to Gupta (1956, 1965). Here the goal is to select a non-empty subset $s\subseteq\{1,\dots,k\}$, of preferably small size, which contains the best population. This goal can be represented in various ways by means of the loss function. The first paper within this framework is due to Deely and Gupta (1968), which deals with the linear loss function $L(\theta,s)=\sum_{i\in s}a_{si}(\theta_{[k]}-\theta_i)$. The loss function $L(\theta,s)=\sum_{i\in s}\big(a-b\,I_{\{\theta_{[k]}\}}(\theta_i)\big)$ has been used by Bratcher and Bhalla (1974) and Gupta and Hsu (1977), the loss function $L(\theta,s)=c|s|+\theta_{[k]}-\max_{i\in s}\theta_i$ has been used by Goel and Rubin (1977), and the additive loss function $L(\theta,s)=\sum_{i\in s}l_i(\theta)$, with emphasis on the special case $L(\theta,s)=\sum_{i\in s}(\theta_{[k]}-\theta_i-\varepsilon)$, has been used by Miescke (1979). Another, non-additive, loss function has been used in Chernoff and Yahav (1977). Further references in this regard can be found in Gupta and Panchapakesan (1979).

The other type of selection procedure combines selection of the best population with the estimation of the parameter of the selected population. The decision rules are now of the form $\delta(x)=(d(x),e_{d(x)}(x))$, where $d(x)\in\{1,\dots,k\}$ is the selection rule, and $e_i(x)$, $i=1,\dots,k$, is a collection of estimates for $\theta_i$, $i=1,\dots,k$, which are available after selection. As shown in Cohen and Sackrowitz (1988) and Gupta and Miescke (1990), the decision theoretic treatment of the combined selection and estimation problem consists of two steps of optimization. First, the possible estimates are determined, which turn out to be the usual Bayes estimates of the related problem of estimation without considering selection. Then, knowing the available estimates, the optimum selection is made. Detailed results for the normal case, under the additive loss function $L(\theta,\delta)=A(\theta,d)+B(\theta_d,e_d)$ and various special cases, are presented in Gupta and Miescke (1990). Here $A(\theta,d)$ is the loss due to selecting population $P_d$, and $B(\theta_d,e_d)$ is the loss of estimating $\theta_d$ with $e_d$, $d=1,\dots,k$. Similar work for the binomial case has been done by Gupta and Miescke (1993). In both papers, overviews of work in this direction and further references can be found.

3.  Bayes  One-  and  Two-Stage  Sampling  Designs 

Starting with the situation described in the previous section up to (3), let us now consider a fixed total sample size allocation problem. Suppose that in the planning stage of the experiment, a total of $n_1+\dots+n_k=n$ observations are planned to be drawn from the respective populations. Since after every observed $X=x$, the posterior Bayes risk will be equal to $\min_{i=1,\dots,k}E\{L(\Theta,i)\mid X=x\}$, the optimum allocation of $n_1,\dots,n_k$, i.e. the Bayes design, is given by the following criterion:

\[
\begin{aligned}
&\min_{n_1+\dots+n_k=n}E\Big(\min_{i=1,\dots,k}E\{L(\Theta,i)\mid X\}\Big), && \text{in general,}\\
&\max_{n_1+\dots+n_k=n}E\Big(\max_{i=1,\dots,k}P\{\Theta_i=\Theta_{[k]}\mid X\}\Big), && \text{for "0-1" loss,}\\
&\max_{n_1+\dots+n_k=n}E\Big(\max_{i=1,\dots,k}E\{\Theta_i\mid X\}\Big), && \text{for linear loss,}
\end{aligned}
\tag{10}
\]

where the outer expectation is with respect to the unconditional marginal distribution of $X$. For the normal-normal model and the binomial-beta model, its representation is given in (5) and in (7) and (8), respectively. The inner conditional expectations and probabilities in (10) are represented in general form in (4), and in their special forms for the normal-normal model and the binomial-beta model in (6) and (9), respectively.

In  the  first  part  of  this  section,  Bayes  designs  for  the  two  special  models  will  be  studied 
under the “0-1” loss and the linear loss. For the normal-normal model, the Bayes designs
consist of those sample sizes $n_1,\dots,n_k$ which achieve the following maximum.


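\[
\begin{aligned}
&\max_{n_1+\dots+n_k=n} E\Big(\max_{i=1,\dots,k}\int_{\mathbb{R}}\prod_{j\ne i}\Phi\big(\delta_j^{-1}[\delta_i\theta+\mu_i(X_i)-\mu_j(X_j)]\big)\,\varphi(\theta)\,d\theta\Big), && \text{for ``0-1'' loss,}\\
&\max_{n_1+\dots+n_k=n} E\Big(\max_{i=1,\dots,k}\frac{p_iX_i+\nu_i\mu_i}{p_i+\nu_i}\Big), && \text{for linear loss,}
\end{aligned}
\tag{11}
\]

where in the first criterion, $\Phi$ and $\varphi$ denote the c.d.f. and density, respectively, of $N(0,1)$, and
where $\delta_r = (p_r+\nu_r)^{-1/2}$, $\mu_r(X_r) = (p_rX_r+\nu_r\mu_r)/(p_r+\nu_r)$, and $p_r = n_r\sigma^{-2}$, $r = 1,\dots,k$, for
brevity. The outer expectations are with respect to the marginally independent random variables
$X_r \sim N(\mu_r,\,p_r^{-1}+\nu_r^{-1})$, $r = 1,\dots,k$.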

The  special  case  of  k  =  2  populations  has  been  completely  analyzed  in  Gupta  and  Miescke 
(1994).  For  both  loss  functions  it  has  been  shown  there,  using  different  techniques,  that  the 
Bayes design is determined by minimizing $|p_1+\nu_1-p_2-\nu_2|$, the absolute difference between
the two posterior precisions. It is zero if and only if the joint posterior distribution of $\Theta_1$ and
$\Theta_2$ is decreasing in transposition, a situation in which the Bayes selection rules are of simple
forms (Gupta and Miescke, 1988). The results for $k = 2$ mentioned above carry over to $k \ge 3$
populations to some extent. This has been shown in Gupta and Miescke (1996a) for the linear
loss, mainly for the special case of $n = 1$. The latter plays an important role in optimum
sequential allocations, which will be discussed in more detail in the next section.
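
The $k = 2$ criterion above is easy to evaluate by enumeration. The following is a minimal sketch, not code from the cited papers: the function name `two_population_design` and its argument names are hypothetical, and a common known sampling variance `sigma2` is assumed, so that $p_r = n_r/\sigma^2$.

```python
def two_population_design(n, sigma2, nu1, nu2):
    """Enumerate (n1, n2) with n1 + n2 = n and return those minimizing
    |p1 + nu1 - p2 - nu2|, where p_r = n_r / sigma2 (illustrative helper)."""
    best, best_gap = [], None
    for n1 in range(n + 1):
        n2 = n - n1
        # Absolute difference between the two posterior precisions.
        gap = abs(n1 / sigma2 + nu1 - n2 / sigma2 - nu2)
        if best_gap is None or gap < best_gap - 1e-12:
            best, best_gap = [(n1, n2)], gap
        elif abs(gap - best_gap) <= 1e-12:
            best.append((n1, n2))  # keep ties, all of them are Bayes designs
    return best, best_gap
```

For instance, with $n = 10$, $\sigma^2 = 1$, $\nu_1 = 1$ and $\nu_2 = 3$, the sketch returns the single design $(6, 4)$, which equalizes the two posterior precisions.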

For the binomial-beta model, the Bayes designs consist of those sample sizes
$n_1,\dots,n_k$ which achieve the following maximum.

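\[
\begin{aligned}
&\max_{n_1+\dots+n_k=n} E\Big(\max_{i=1,\dots,k}\int_{\mathbb{R}}\prod_{j\ne i}H_{j,X_j}(\theta)\,h_{i,X_i}(\theta)\,d\theta\Big), && \text{for ``0-1'' loss,}\\
&\max_{n_1+\dots+n_k=n} E\Big(\max_{i=1,\dots,k}\frac{\alpha_i+X_i}{\alpha_i+\beta_i+n_i}\Big), && \text{for linear loss,}
\end{aligned}
\tag{12}
\]

where in the first criterion, $H_{r,x_r}$ and $h_{r,x_r}$ denote the c.d.f. and the density, respectively, of
$\mathrm{Beta}(\alpha_r+x_r,\ \beta_r+n_r-x_r)$, $r = 1,\dots,k$, for brevity. The outer expectations are with respect to
the marginally independent random variables $X_r \sim PE(n_r,\alpha_r,\beta_r,1)$, the Polya-Eggenberger
distribution given by (8), $r = 1,\dots,k$.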

In  the  second  part  of  this  section,  the  one-stage  model  considered  above  will  be  extended 
to  a  two-stage  model,  which  can  be  summarized,  after  a  standard  reduction  by  sufficiency,  as 


follows. At $\theta \in \Omega^k$, let $X_i$ and $Y_i$ be sufficient statistics of samples of sizes $n_i$ and $m_i$ from
population $P_i$ at Stage 1 and Stage 2, respectively, which altogether are independent. It is
assumed that sampling at Stage 1 has been completed already, where $X = (x_1,\dots,x_k)$ has been
observed, and that it is planned to allocate observations $Y = (Y_1,\dots,Y_k)$ for Stage 2, subject to
$m_1+\dots+m_k = m$, where $m$ is fixed.

First, let us consider the situation at the end of Stage 2, where both $X = x$ and $Y = y$
have been observed. From (2) and (3) it follows that every Bayes selection rule $d_B(x,y)$, say,
is determined by

\[
E\{L(\Theta,d_B(x,y))\mid X=x,\ Y=y\} = \min_{i=1,\dots,k} E\{L(\Theta,i)\mid X=x,\ Y=y\}. \tag{13}
\]

Here  X  and  Y  are  not  combined  into  an  overall  sufficient  statistic,  since  the  situation  at  the  end 
of  Stage  1  will  be  studied  now.  The  criterion  for  allocating  observations  Y  for  Stage  2,  after 
having observed $X = x$ at Stage 1, is to find $m_1,\dots,m_k$, subject to the side condition
$m_1+\dots+m_k = m$, for which the following minimum is achieved.

\[
\begin{aligned}
&\min_{m_1+\dots+m_k=m} E\big\{\,E\{L(\Theta,d_B(x,Y))\mid X=x,\ Y\}\ \big|\ X=x\big\}\\
&\quad= \min_{m_1+\dots+m_k=m} E\big\{\min_{i=1,\dots,k} E\{L(\Theta,i)\mid X=x,\ Y\}\ \big|\ X=x\big\},
\end{aligned}
\tag{14}
\]

where the outer expectation is with respect to the conditional distribution of $Y$, given $X = x$. It
should be pointed out that in (13) and (14), $d_B$ does not only depend on $n_1,\dots,n_k$, which are
fixed here, but also on $m_1,\dots,m_k$, which are varying, since every design of specific $n$'s and $m$'s
has its own Bayes selection rules.

From  now  on  it  is  assumed  that  Stage  1  has  been  completed  already,  i.e.  that  X  =  x  has 
been observed, and that a Bayes design for Stage 2 with $m_1+\dots+m_k = m$ has to be determined.
In this situation, it proves to be convenient to update the prior with the information provided
by $X = x$ (Berger, 1985), i.e. to treat Stage 2 with observations $Y$ as a first stage and to use
the updated prior density $\pi(\theta \mid x)$ as a prior density. The Bayes designs are then all sampling
allocations $m_1,\dots,m_k$ for which the following minimum or maximum, respectively, is achieved.

\[
\begin{aligned}
&\min_{m_1+\dots+m_k=m} E_x\Big(\min_{i=1,\dots,k} E_x\{L(\Theta,i)\mid Y\}\Big), && \text{in general,}\\
&\max_{m_1+\dots+m_k=m} E_x\Big(\max_{i=1,\dots,k} P_x\{\Theta_i=\Theta_{[k]}\mid Y\}\Big), && \text{for ``0-1'' loss,}\\
&\max_{m_1+\dots+m_k=m} E_x\Big(\max_{i=1,\dots,k} E_x\{\Theta_i\mid Y\}\Big), && \text{for linear loss,}
\end{aligned}
\tag{15}
\]

where here and in the following, the subscript $x$ at all probabilities and expectations indicates
that the updated prior, based on $X = x$, is used as prior. Comparing now (15) with (10), one
can see that the design problem for Stage 2, posed at the end of Stage 1, can be considered as
a one-stage design problem and treated as such.

On the other hand, if one prefers not to update the prior, then in (15), the inner
operations $E_x$ and $P_x$ are with respect to the conditional distribution of $\Theta$, given $X = x$ and $Y$,
and the outer operations $E_x$ are with respect to the conditional distribution of $Y$, given $X = x$.
Both approaches are valid, equivalent, and lead to the same results.
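
The equivalence of the two approaches can be illustrated in the binomial-beta model, where updating the prior with $X = x$ and then with $Y = y$ gives the same posterior as conditioning on both at once. The sketch below is only an illustration: the helper names `update` and `posterior_mean` and the numerical values are assumptions, not taken from the source.

```python
from fractions import Fraction


def update(a, b, successes, trials):
    """Beta(a, b) prior plus binomial data gives Beta(a + s, b + t - s)."""
    return a + successes, b + trials - successes


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


# Hypothetical numbers for one population: prior Beta(1, 1),
# Stage 1 with x = 3 successes in n = 5 trials,
# Stage 2 with y = 2 successes in m = 4 trials.
alpha, beta = 1, 1
x, n = 3, 5
y, m = 2, 4

a_x, b_x = update(alpha, beta, x, n)               # updated prior after Stage 1
two_step = update(a_x, b_x, y, m)                  # Stage 2 treated as a first stage
one_step = update(alpha, beta, x + y, n + m)       # no intermediate update
```

Both routes end at the same Beta distribution, so any posterior quantity used in (15), such as the posterior mean, agrees as well.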

Next, Bayes designs for the two special models will be studied under
the “0-1” loss and the linear loss. For the normal-normal model, it can be shown (Gupta and
Miescke, 1994) that the Bayes designs consist of those sample sizes $m_1,\dots,m_k$ which achieve
the following respective maximum.


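\[
\begin{aligned}
&\max_{m_1+\dots+m_k=m} E\Big(\max_{i=1,\dots,k}\int_{\mathbb{R}}\prod_{j\ne i}\Phi\big(\delta_j^{-1}[\delta_i z+\mu_i(x_i)-\mu_j(x_j)+\gamma_iN_i-\gamma_jN_j]\big)\,\varphi(z)\,dz\Big), && \text{for ``0-1'' loss,}\\
&\max_{m_1+\dots+m_k=m} E\Big(\max_{i=1,\dots,k}\big[\mu_i(x_i)+\gamma_iN_i\big]\Big), && \text{for linear loss,}
\end{aligned}
\tag{16}
\]

where $\delta_r = (p_r+q_r+\nu_r)^{-1/2}$, $\gamma_r = \delta_r\,q_r^{1/2}(p_r+\nu_r)^{-1/2}$, $\mu_r(x_r) = (p_rx_r+\nu_r\mu_r)/(p_r+\nu_r)$,
with $p_r = n_r\sigma^{-2}$ and $q_r = m_r\sigma^{-2}$, $r = 1,\dots,k$, and $N_1,\dots,N_k$ are generic independent $N(0,1)$
random variables.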

The results for the special case of $k = 2$ are analogous to the one-stage Bayes design
mentioned earlier. Results for the case of $k \ge 3$ are only known for the linear loss. They are
based, in view of (16), on the properties of $E(\max_{i=1,\dots,k}\{\mu_i+\tau_iN_i\})$ as a function of
$\mu_i \in \mathbb{R}$ and $\tau_i > 0$, $i = 1,\dots,k$. These properties have been derived in Gupta and Miescke
(1996a) using the auxiliary function

\[
T(w) = w\,\Phi(w)+\varphi(w) = \int_{-\infty}^{w}\Phi(v)\,dv, \qquad w \in \mathbb{R}. \tag{17}
\]

Here the special case of $m = 1$ has been worked out in detail, which is relevant for the Bayes
sequential allocations to be considered in Section 4. Discussion of further results in this respect
will thus be postponed until Section 4.
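
The two expressions for $T$ in (17) can be checked against each other numerically. The following minimal sketch uses only the standard library; the helper names `phi`, `Phi`, `T` and `T_by_integration`, the lower integration limit, and the step count are illustrative assumptions.

```python
import math


def phi(w):
    """Standard normal density."""
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)


def Phi(w):
    """Standard normal c.d.f., expressed through the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


def T(w):
    """Closed form in (17): T(w) = w * Phi(w) + phi(w)."""
    return w * Phi(w) + phi(w)


def T_by_integration(w, lower=-12.0, steps=20000):
    """Trapezoidal approximation of the integral of Phi over [lower, w].

    The part of the integral below `lower` is negligible for lower = -12.
    """
    h = (w - lower) / steps
    total = 0.5 * (Phi(lower) + Phi(w))
    for s in range(1, steps):
        total += Phi(lower + s * h)
    return total * h
```

Two consequences of (17) that are used repeatedly are $T(0) = \varphi(0)$ and $T(w) - T(-w) = w$; the latter is what makes the two equivalent forms of the one-step look-ahead criterion agree.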

For the binomial-beta model, the Bayes designs for Stage 2 consist of those sample sizes
$m_1,\dots,m_k$ for which, subject to $m_1+\dots+m_k = m$, the following maxima are achieved.

\[
\begin{aligned}
&\max_{m_1+\dots+m_k=m}\ \sum_{y}\Big(\max_{i=1,\dots,k}\int_{\mathbb{R}}\prod_{j\ne i}H_{j,x_j,y_j}(\theta)\,h_{i,x_i,y_i}(\theta)\,d\theta\Big)P_x\{Y=y\}, && \text{for ``0-1'' loss,}\\
&\max_{m_1+\dots+m_k=m}\ \sum_{y}\Big(\max_{i=1,\dots,k}\frac{a_i+y_i}{a_i+b_i+m_i}\Big)P_x\{Y=y\}, && \text{for linear loss,}
\end{aligned}
\tag{18}
\]

where in the first criterion, $H_{r,x_r,y_r}$ and $h_{r,x_r,y_r}$ denote the c.d.f. and the density, respectively, of
$\mathrm{Beta}(a_r+y_r,\ b_r+m_r-y_r)$, with $a_r = \alpha_r+x_r$ and $b_r = \beta_r+n_r-x_r$, $r = 1,\dots,k$, for brevity.
The sums in (18) are expectations with respect to the conditional distribution of $Y$, given
$X = x$, which, analogously to (8), are given by


\[
P_x(Y=y) = \prod_{i=1}^{k}\binom{m_i}{y_i}\frac{\Gamma(a_i+b_i)}{\Gamma(a_i)\,\Gamma(b_i)}\,\frac{\Gamma(a_i+y_i)\,\Gamma(b_i+m_i-y_i)}{\Gamma(a_i+b_i+m_i)}, \tag{19}
\]

where $y_i = 0,1,\dots,m_i$, $i = 1,\dots,k$.
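
For small $m$ and $k$, the linear-loss criterion in (18) can be evaluated exactly by enumerating all designs and all outcomes, with the weights taken from (19). The following is a minimal sketch under these assumptions: all function names are hypothetical, the updated parameters `a` and `b` are passed as tuples of integers (or fractions), and exact rational arithmetic is used so that ties between designs are detected reliably.

```python
from fractions import Fraction
from itertools import product
from math import comb


def rising(a, n):
    """Rising factorial a (a+1) ... (a+n-1), i.e. Gamma(a+n) / Gamma(a)."""
    out = Fraction(1)
    for s in range(n):
        out *= a + s
    return out


def polya_eggenberger(y, m, a, b):
    """One factor of (19): P_x(Y_i = y) for m_i = m and updated prior Beta(a, b)."""
    return comb(m, y) * rising(a, y) * rising(b, m - y) / rising(a + b, m)


def expected_gain(ms, a, b):
    """Linear-loss sum in (18) for one design ms = (m_1, ..., m_k)."""
    total = Fraction(0)
    for y in product(*(range(mi + 1) for mi in ms)):
        prob = Fraction(1)
        for yi, mi, ai, bi in zip(y, ms, a, b):
            prob *= polya_eggenberger(yi, mi, ai, bi)
        best_mean = max(Fraction(ai + yi) / (ai + bi + mi)
                        for yi, mi, ai, bi in zip(y, ms, a, b))
        total += best_mean * prob
    return total


def compositions(m, k):
    """All tuples (m_1, ..., m_k) of non-negative integers with sum m."""
    if k == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, k - 1):
            yield (first,) + rest


def opt_designs(m, a, b):
    """Designs achieving the maximum in (18), together with the maximal value."""
    gains = {ms: expected_gain(ms, a, b) for ms in compositions(m, len(a))}
    best = max(gains.values())
    return [ms for ms, g in gains.items() if g == best], best
```

With two populations, uniform updated priors $a = b = (1,1)$ and $m = 1$, both single-observation designs tie at the value $7/12$, compared with $1/2$ when no observation is taken.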

No  further  results  are  known  in  this  situation  under  the  “0-1  ”  loss.  However,  under  the 
linear  loss,  interesting  theoretical  as  well  as  numerical  results  have  been  found  by  Gupta  and 
Miescke  (1996b)  and  Miescke  and  Park  (1997a).  These  are  relevant  for  the  Bayes  sequential 
allocations  and  will  be  discussed  in  the  next  section. 

To  conclude  this  section,  some  comments  will  be  made  regarding  cost  of  sampling. 
Suppose that every single observation costs a certain amount $\lambda$, say. Then (10) and (15) have
to be compared with the respective cost of sampling $n\lambda$ and $m\lambda$. If the cost of sampling turns
out to be larger than the minimum posterior expectation given in (10) or (15), respectively,
then  apparently  it  is  not  worth  taking  all  of  these  observations.  This  approach  has  been  treated 
by  Gupta  and  Miescke  (1993),  within  the  problem  of  combined  selection  and  estimation,  in  the 
binomial-beta  model.  It  leads,  among  other  considerations,  to  finding  the  largest  sample  size, 
subject  to  its  given  upper  bound,  which  is  worth  allocating  to  incur  a  gain.  In  the  next  section, 
where  observations  are  allocated  and  taken  in  a  sequential  fashion,  cost  of  sampling  would 
require  the  incorporation  of  a  stopping  rule.  However,  this  will  not  be  done  there  to  keep  the 
presentation  of  basic  ideas  simple.  Modifications  of  these  sequential  allocation  rules  to  this 
more  general  setting  are  straightforward,  but  more  involved.  Therefore,  cost  of  sampling  will 
not  be  considered  any  further  from  now  on. 

4.  Bayes  Look-Ahead  Sequential  Sampling  Designs 

From now on it is assumed that Stage 1 has been completed, i.e. that $X = x$ has been
observed already, and that $m$ additional observations $Y = (Y_1,\dots,Y_k)$ are planned to be drawn
at Stage 2. The optimum allocations of sample sizes $m_1,\dots,m_k$, i.e. the Bayes designs, are
determined by criterion (15), i.e. by

\[
\min_{m_1+\dots+m_k=m} E_x\Big(\min_{i=1,\dots,k} E_x\{L(\Theta,i)\mid Y\}\Big). \tag{20}
\]

The first step toward sequential Bayes designs is to consider an intermediate step of
Stage 2 sampling, where so far only $\tilde Y = \tilde y$ has been observed, with $\tilde m_i$ observations from
population $P_i$, $i = 1,\dots,k$, where $\tilde m_1+\dots+\tilde m_k = \tilde m$, and where $\tilde m$ with $1 \le \tilde m < m$ is fixed. The
best allocation of the remaining $\hat m = m-\tilde m$ observations with $\hat m_i = m_i-\tilde m_i \ge 0$, $i = 1,\dots,k$,
achieves

\[
\min_{\hat m_1+\dots+\hat m_k=\hat m} E_x\Big(\min_{i=1,\dots,k} E_x\{L(\Theta,i)\mid \tilde Y=\tilde y,\ \hat Y\}\ \Big|\ \tilde Y=\tilde y\Big), \tag{21}
\]

where the outer expectation is with respect to the conditional distribution of the new
observations $\hat Y$, say, given $\tilde Y = \tilde y$ (and $X = x$).

Returning now to the end of Stage 1, the optimum two-step allocation for drawing first $\tilde m$
and then $\hat m = m-\tilde m$ observations at Stage 2 is found by backward optimization. First one has
to consider every possible sample size configuration $\tilde m_1,\dots,\tilde m_k$ and every possible outcome
$\tilde Y = \tilde y$. For each such setting, one allocation $\hat m_i(\tilde y,\tilde m_1,\dots,\tilde m_k)$, $i = 1,\dots,k$, has to be found
which achieves (21). Then one has to find an allocation $\tilde m_1,\dots,\tilde m_k$ which achieves

\[
\min_{\tilde m_1+\dots+\tilde m_k=\tilde m} E_x\Big(\min_{\hat m_1+\dots+\hat m_k=\hat m} E_x\Big(\min_{i=1,\dots,k} E_x\{L(\Theta,i)\mid \tilde Y,\ \hat Y\}\ \Big|\ \tilde Y\Big)\Big). \tag{22}
\]


Here one should be aware of the fact that the information contained in $Y$ is the combined
information gained from $\tilde Y$ and $\hat Y$. It should also be pointed out clearly that in the middle
minimization operation of (22), $\hat m_1,\dots,\hat m_k$ depend on $\tilde m_1,\dots,\tilde m_k$ and on $\tilde Y$, and thus they are
random variables themselves. This is the very reason why (22) can be handled numerically and
in computer simulations in the binomial-beta model, where $Y$ is discrete and assumes only
finitely many values, but not in the normal-normal model.

Comparing now (20) with (22), one can show that the latter must be less than or equal to
the former. If one deletes in (22) the minimum to the right of the first expectation, and inserts a
minimum to the left of the same, subject to $\hat m_i = m_i-\tilde m_i \ge 0$, and subject to $\hat m_1+\dots+\hat m_k = \hat m$,
then the resulting value cannot be smaller. Combining now the two iterated minimization
operations into one leads to (20). To summarize, one can state (Miescke and Park, 1997a) the
following result.

Theorem 1. For fixed $m$ and $\tilde m < m$, the best allocation for drawing first $\tilde m$ and then
$\hat m = m-\tilde m$ observations at Stage 2 is at least as good as the best allocation of all $m$
observations in one step, in the sense that the posterior Bayes risk (22) of the former is not
larger than that of the latter, given by (20). This process of stepwise optimum allocation
can be iterated for further improvements. The overall best allocation scheme is to draw, in $m$
steps, one observation at a time, where each allocation is determined by backward optimization.

In view of this theorem, several reasonable sampling allocation schemes can be
constructed which utilize information gained from observations at previous steps. Let $R_t$, for
$t \le m$, denote the allocation of $t$ observations determined by (15), with $m$ replaced by $t$ there.
Moreover, let $R_{t,1}$ allocate any single observation to one of the populations sampled by $R_t$. In
a similar way let $R^*_{t,1}$ allocate one observation to one of the populations to which $R_t$ assigns
the largest allocation. Finally, denote by $B_1$ the optimum allocation of one observation,
knowing all future allocation strategies. It should be pointed out that, unlike the other
allocations considered above, $B_1$ is not a stand-alone procedure, since it requires the
knowledge of what will be done after it has been applied.

Using these three types of intermediate allocation rules, the following schemes of
allocating $m$ observations are possible. $(R_m)$ allocates all $m$ observations at once, using (15).
This fixed sample size $m$ Bayes design will be denoted by OPT in the following. A better
allocation scheme, in terms of the Bayes risk, is $(R_{m,1},R_{m-1})$, which uses $R_{m,1}$ for the first
allocation, and then $R_{m-1}$ for the rest. Better than $(R_{m,1},R_{m-1})$, of course, is $(B_1,R_{m-1})$, which
uses backward optimization $B_1$ for the first allocation, knowing that $R_{m-1}$ will be used for
allocating the remaining $m-1$ observations in one step. In this fashion, similar and also more
complicated allocation schemes can be constructed (Gupta and Miescke, 1996a), which are
linked through a partial ordering in terms of their Bayes risks. Such constructions are
motivated by the fact that the overall optimum allocation scheme $(B_1,B_1,\dots,B_1,R_1)$, denoted
by BCK, is not practicable, except for small $m$ and $k$, up to about $m = 20$ for $k = 3$, in the
binomial-beta model. For this model, the allocation scheme APP, say, which is
$(R^*_{m,1},R^*_{m-1,1},\dots,R^*_{2,1},R_1)$, appears to be a very good approximation to BCK under the linear
loss. This will be justified at the end of this section.
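
The backward optimization behind BCK can be written as a short recursion in the binomial-beta model under the linear loss. The sketch below is an illustration under stated assumptions, not the program used in the cited papers: the function names are hypothetical, the updated parameters $a_r$ and $b_r$ are assumed to be positive integers passed as tuples, and the value is expressed as the expected maximum posterior mean, which is to be maximized.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def bck_value(a, b, remaining):
    """Largest achievable expected maximum posterior mean when `remaining`
    observations are still to be allocated one at a time."""
    if remaining == 0:
        return max(Fraction(ai, ai + bi) for ai, bi in zip(a, b))
    return max(step_value(a, b, remaining, i) for i in range(len(a)))


def step_value(a, b, remaining, i):
    """Value of taking the next observation from population i and
    continuing optimally afterwards."""
    p = Fraction(a[i], a[i] + b[i])                  # predictive success probability
    a_succ = a[:i] + (a[i] + 1,) + a[i + 1:]         # parameters after a success
    b_fail = b[:i] + (b[i] + 1,) + b[i + 1:]         # parameters after a failure
    return (p * bck_value(a_succ, b, remaining - 1)
            + (1 - p) * bck_value(a, b_fail, remaining - 1))


def bck_first_allocations(a, b, m):
    """Populations with which backward optimization allows to start."""
    values = [step_value(a, b, m, i) for i in range(len(a))]
    best = max(values)
    return [i for i, v in enumerate(values) if v == best], best
```

The number of distinct states grows quickly with $m$ and $k$, which is consistent with the remark above that BCK is practicable only for small $m$ and $k$.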

The allocation scheme $(R_1,R_1,\dots,R_1)$ allocates in $m$ steps one observation at a time,
using $R_1$, pretending that it would be the last one before making the final (selection) decision.
It looks ahead one observation at a time (Berger, 1985, Amster, 1963) and will be henceforth
denoted by LAH. It should not be confused with the allocation scheme SOA, say, which
allocates in $m$ steps one observation at a time, using the “state of the art”. To be more specific,
suppose that $\tilde Y = \tilde y$ has been drawn so far. Then SOA allocates the next observation to any
one of those populations which are associated with the minimum of the $k$ values of
$E_x\{L(\Theta,i) \mid \tilde Y = \tilde y\}$, $i = 1,\dots,k$. Two other allocation schemes, which will be considered later
in the simulation study of the binomial-beta model, should be mentioned here. The first assigns
one observation at a time, each purely at random, regardless of the previous observations, and
is denoted by RAN. The second assigns $m/k$ observations to each population, provided that
$m$ is divisible by $k$, and is denoted by EQL.
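
To make the difference between these simple schemes concrete, here is a minimal sketch of one SOA step and of EQL in the binomial-beta model. It assumes the linear loss, for which minimizing the posterior expected loss amounts to maximizing the posterior mean, as in (15). The function names are hypothetical, and `a`, `b` stand for the current Beta parameters, i.e. the updated parameters including the Stage 2 observations drawn so far.

```python
import random
from fractions import Fraction


def soa_candidates(a, b):
    """Populations with the largest current posterior mean a_i / (a_i + b_i)."""
    means = [Fraction(ai, ai + bi) for ai, bi in zip(a, b)]
    top = max(means)
    return [i for i, mean in enumerate(means) if mean == top]


def soa_next(a, b, rng=random):
    """One SOA step, with ties broken purely at random."""
    return rng.choice(soa_candidates(a, b))


def eql_allocation(m, k):
    """EQL: m / k observations to each of the k populations."""
    if m % k != 0:
        raise ValueError("EQL requires m to be divisible by k")
    return (m // k,) * k
```

SOA only looks at the current posterior, whereas LAH evaluates what one additional observation from each population would be expected to achieve.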

Theoretical results for allocation scheme LAH in the normal-normal model and under the
linear loss, which are presented in Gupta and Miescke (1994, 1996a), will now be discussed.
Here, it is sufficient to consider the first allocation $R_1$ in $(R_1,R_1,\dots,R_1)$, which is based on
$X = x$. All consecutive allocations $R_1$ are decided analogously, based on $X = x$ and the
observations $\tilde Y = \tilde y$ that have been taken so far at Stage 2. Starting with criterion (16) for the
linear loss with $m = 1$, where exactly one of the sample sizes $m_1,\dots,m_k$ is equal to one, and
all others are zero, i.e. where exactly one of $q_1,\dots,q_k$ is equal to $\sigma^{-2}$ and all others are zero,
this first observation is taken from one of the populations which yield

\[
\begin{aligned}
&\max_{i=1,\dots,k} E\Big(\max\Big\{\mu_i(x_i)+\sigma_iN_i,\ \max_{j\ne i}\{\mu_j(x_j)\}\Big\}\Big)\\
&\quad= \max_{i=1,\dots,k}\Big[\mu_i(x_i)+\sigma_i\,T\Big(\big[\max_{j\ne i}\{\mu_j(x_j)\}-\mu_i(x_i)\big]/\sigma_i\Big)\Big],
\end{aligned}
\tag{23}
\]

where $\sigma_r = (p_r+\sigma^{-2}+\nu_r)^{-1/2}(p_r+\nu_r)^{-1/2}\sigma^{-1}$, and $\mu_r(x_r)$, $r = 1,\dots,k$, are defined below
(16), and the function $T$ is given by (17).
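
Criterion (23) is straightforward to evaluate. The following minimal sketch computes the $k$ look-ahead values and the maximizing populations. The function and argument names are hypothetical: `post_means[r]` stands for $\mu_r(x_r)$, `precisions[r]` for the posterior precision $p_r+\nu_r$ after Stage 1, and `sigma2` for the common sampling variance $\sigma^2$.

```python
import math


def T(w):
    """Auxiliary function (17): T(w) = w * Phi(w) + phi(w)."""
    cdf = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return w * cdf + pdf


def lah_normal_values(post_means, sigma2, precisions):
    """Right-hand side of (23) for each candidate population i (needs k >= 2)."""
    values = []
    for i, (mu_i, prec_i) in enumerate(zip(post_means, precisions)):
        # sigma_i = (p_i + sigma^-2 + nu_i)^(-1/2) (p_i + nu_i)^(-1/2) sigma^-1
        sigma_i = 1.0 / math.sqrt(sigma2 * prec_i * (prec_i + 1.0 / sigma2))
        rival = max(mu for j, mu in enumerate(post_means) if j != i)
        values.append(mu_i + sigma_i * T((rival - mu_i) / sigma_i))
    return values


def lah_normal_choices(post_means, sigma2, precisions, tol=1e-12):
    """Populations from which LAH may take the next observation."""
    values = lah_normal_values(post_means, sigma2, precisions)
    best = max(values)
    return [i for i, v in enumerate(values) if best - v <= tol]
```

With two populations of equal posterior precision the two values coincide, and if the population with the smaller posterior mean has the smaller precision (hence the larger $\sigma_i$), it is the one sampled next.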

To describe the properties of the first allocation $R_1$ in $(R_1,R_1,\dots,R_1)$, it proves useful to
consider the ordered values $\mu_{[1]}(x) \le \mu_{[2]}(x) \le \dots \le \mu_{[k]}(x)$ of $\mu_r(x) = \mu_r(x_r)$, $r = 1,\dots,k$. Let
$P_{(i)}$ be the population, and let $\sigma_{(i)}$ be the standard deviation, which is associated with $\mu_{[i]}(x)$,
$i = 1,\dots,k$. Then one can state the following result.

Theorem 2. After $X = x$ has been observed, the preferences of the first allocation $R_1$ in
$(R_1,R_1,\dots,R_1)$ are as follows.

(I) If $\sigma_{(k-1)} < (=,>)\ \sigma_{(k)}$, then allocating to $P_{(k-1)}$ is worse than (equivalent to, better than)
allocating to $P_{(k)}$.

(II) If for $1 \le i < j \le k-2$, $\sigma_{(i)} \le \sigma_{(j)}$, then allocating to $P_{(i)}$ is worse than allocating to $P_{(j)}$.

(III) If for $1 \le i < j \le k-2$, $\sigma_{(i)} > \sigma_{(j)}$, and $\sigma_{(i)} < (=,>)\ \sigma_{i,j}$, then allocating to $P_{(i)}$ is worse
than (equivalent to, better than) allocating to $P_{(j)}$, where $\sigma_{i,j}$ is determined by

\[
\mu_{[j]}(x)+\sigma_{(j)}\,T\big((\mu_{[k]}(x)-\mu_{[j]}(x))/\sigma_{(j)}\big) = \mu_{[i]}(x)+\sigma_{i,j}\,T\big((\mu_{[k]}(x)-\mu_{[i]}(x))/\sigma_{i,j}\big). \tag{24}
\]

(IV) Let $P_{(*)}$ be a best allocation to either $P_{(k-1)}$ or $P_{(k)}$ according to (I). Likewise, let $P_{(\bullet)}$ be a
best allocation to $P_{(1)},\dots,P_{(k-2)}$ according to (II) and (III). Then an overall best allocation
is found by using (II) and (III) with $(i)$, $(j)$, $\mu_{[i]}(x)$, and $\mu_{[j]}(x)$ replaced by $(\bullet)$,
$(*)$, $\mu_{[\bullet]}(x)$, and $\mu_{[k-1]}(x)$.

The  proof  can  be  found  in  Gupta  and  Miescke  (1996a),  along  with  further  comments. 
Moreover,  a  numerical  example  with  real  life  data  is  presented  there,  in  which  also 
comparisons are made with respect to standard multiple comparison procedures, such as
Scheffé's and Tukey's methods.
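
The threshold $\sigma_{i,j}$ in (24) has no closed form, but since $\sigma \mapsto \mu+\sigma T((\mu_{[k]}-\mu)/\sigma)$ is strictly increasing in $\sigma > 0$, it can be found by bisection. The sketch below is one possible numerical approach and is not taken from the cited paper; the function names, the bracketing strategy, and the iteration count are assumptions.

```python
import math


def T(w):
    """Auxiliary function (17): T(w) = w * Phi(w) + phi(w)."""
    cdf = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return w * cdf + pdf


def allocation_value(mu_r, sigma_r, mu_top):
    """One side of (24): mu_r + sigma_r * T((mu_top - mu_r) / sigma_r)."""
    return mu_r + sigma_r * T((mu_top - mu_r) / sigma_r)


def sigma_threshold(mu_i, mu_j, sigma_j, mu_top, iterations=200):
    """Solve (24) for sigma_{i,j} by bisection (illustrative helper)."""
    target = allocation_value(mu_j, sigma_j, mu_top)
    lo, hi = 0.0, max(sigma_j, 1.0)
    # Enlarge the bracket until the value at `hi` reaches the target.
    while allocation_value(mu_i, hi, mu_top) < target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if allocation_value(mu_i, mid, mu_top) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

If $\mu_{[i]}(x) = \mu_{[j]}(x)$ the threshold reduces to $\sigma_{(j)}$, and for $\mu_{[i]}(x) < \mu_{[j]}(x)$ it exceeds $\sigma_{(j)}$: the population with the smaller posterior mean needs a larger look-ahead standard deviation to be an equally good choice.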

Theoretical  and  numerical  simulation  results  for  the  binomial-beta  model  under  the  linear 
loss  are  presented  in  Gupta  and  Miescke  (1996b)  and  Miescke  and  Park  (1997a).  These  will 
be discussed in the remainder of this section. First, properties of LAH, the allocation scheme
$(R_1,R_1,\dots,R_1)$, will be studied. As has been justified above, it suffices to consider the first
allocation $R_1$ of it. Starting with criterion (18) for the linear loss with $m = 1$, where exactly
one of the sample sizes $m_1,\dots,m_k$ is equal to one, and all others are zero, this first observation
is taken from one of the populations which yield the following maximum.

\[
\max_{i=1,\dots,k} E_x\Big(\max\Big\{\frac{a_i+Y_i}{a_i+b_i+1},\ \max_{j\ne i}\frac{a_j}{a_j+b_j}\Big\}\Big), \tag{25}
\]

where $a_r = \alpha_r+x_r$ and $b_r = \beta_r+n_r-x_r$, $r = 1,\dots,k$, for brevity. To summarize, one can state
here the following result.

Theorem 3. The first allocation $R_1$ in $(R_1,R_1,\dots,R_1)$ is made with respect to one of the
populations $P_i$, $i = 1,\dots,k$, for which the maximum in (25) is achieved. All consecutive
allocations of the type $R_1$ are made analogously, with $a_r$ and $b_r$ updated to $\alpha_r+x_r+\tilde y_r$ and
$\beta_r+n_r-x_r+\tilde m_r-\tilde y_r$, respectively, with respect to the $\tilde m_r$ observations, represented by $\tilde y_r$,
$r = 1,\dots,k$, which have been made so far at Stage 2.

The proof is given in Gupta and Miescke (1996b), along with further details of the
behavior of this allocation scheme. One interesting point, worth mentioning, is related to
the fact that under the linear loss (with no cost of sampling), the Bayes look-ahead risk cannot
increase when more future observations are included. This fact implies that the maximum in
(25) is always greater than or equal to $\max_{i=1,\dots,k}\{a_i/(a_i+b_i)\}$. However, equality may occur,
in which case one additional observation from any of the populations would not be worth
taking, from the Bayesian point of view. A similar situation may arise at any allocation of the
type $R_1$ in $(R_1,R_1,\dots,R_1)$, and especially at the last allocation. In the latter case, one can
accelerate the process by stopping one observation short of $m$. In the context of sequentially
allocating $m$ observations at Stage 2, however, this is of minor concern and will not be
considered any further.
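
The look-ahead values behind Theorem 3 can be computed exactly. The following is a minimal sketch of one LAH step under the linear loss, evaluating the expectation of the maximum of the updated posterior mean of the sampled population and the unchanged posterior means of the others. The function names are hypothetical, and `a`, `b` are assumed to hold the current positive integer Beta parameters.

```python
from fractions import Fraction


def lah_values(a, b):
    """Look-ahead value of one more observation from each population
    (needs at least two populations)."""
    current = [Fraction(ai, ai + bi) for ai, bi in zip(a, b)]
    values = []
    for i, (ai, bi) in enumerate(zip(a, b)):
        rival = max(mean for j, mean in enumerate(current) if j != i)
        p = current[i]                                # predictive P(Y_i = 1)
        after_success = Fraction(ai + 1, ai + bi + 1)
        after_failure = Fraction(ai, ai + bi + 1)
        values.append(p * max(after_success, rival)
                      + (1 - p) * max(after_failure, rival))
    return values


def lah_choices(a, b):
    """Populations from which LAH may take the next observation."""
    values = lah_values(a, b)
    best = max(values)
    return [i for i, v in enumerate(values) if v == best]
```

The equality case mentioned above is easy to reproduce: for $a = (9, 1)$ and $b = (1, 9)$ both look-ahead values equal $9/10$, the current maximum posterior mean, so one further observation from either population yields no gain.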

Numerical  results  for  allocation  schemes  EQL  and  OPT,  and  computer  simulation  results 
for  RAN,  SOA,  and  LAH  have  been  presented  for  k  =  3  populations  in  Gupta  and  Miescke 
(1996b).  The  computer  programs  were  written  in  Microsoft  Quick  Basic  Version  4.5,  using 
subroutines  from  Sprott  (1991).  These  results  have  been  extended,  using  Microsoft  Visual 
Basic  Version  4.0,  to  numerical  results  for  the  overall  optimum  allocation  scheme  BCK  in 
Miescke and Park (1997a), where it has also been found that APP appears to be a very good
approximation to BCK. In summary, the performances of the following allocation schemes
have been studied, which have been explained in more detail earlier in this section.

RAN  Assign one observation at a time, each purely at random.

EQL  Assign $m/3$ observations to each population $P_1$, $P_2$, $P_3$.

SOA  Assign one observation at a time, following the state of the art.

LAH  Assign one observation at a time, using $(R_1,R_1,\dots,R_1)$.

OPT  Assign $m_t$ observations to $P_t$, $t = 1,2,3$, using $(R_m)$.

APP  Assign one observation at a time, using $(R^*_{m,1},R^*_{m-1,1},\dots,R^*_{2,1},R_1)$.

BCK  Assign one observation at a time, using backward optimization.

In three examples, with suitably chosen values for $a_r = \alpha_r+x_r$ and $b_r = \beta_r+n_r-x_r$,
$r = 1,2,3$, to cover various interesting settings, the performances of these allocation schemes
have been compared. The values for $m$ considered have been 1, 3, 9, and 15. For $m = 1$,
EQL has been set to take its observation from population $P_1$, rather than leaving the
respective spaces empty in the tables. As to ties, RAN, SOA, LAH, and APP have been used,
and are recommended to be used, with ties broken purely at random, with equal probabilities,
whenever they occur. This recommendation is corroborated by findings in the numerical
studies.

Comparing  the  expected  posterior  gains  of  the  first  five  allocation  schemes,  it  turns  out 
that  overall,  LAH  and  OPT  are  performing  similarly  well,  each  sometimes  better  than  the 
other,  but  clearly  better  than  RAN,  EQL,  and  SOA.  The  latter  effect  is  found  to  be  increasing 
in $m$. That LAH is not always as good as OPT proves that it cannot be any version, i.e. with
any type of breaking ties, of the allocation scheme $(R_{m,1},R_{m-1,1},\dots,R_{2,1},R_1)$, and thus in
particular it cannot be equal to $(R^*_{m,1},R^*_{m-1,1},\dots,R^*_{2,1},R_1)$, i.e. APP, since the latter two are
always at least as good as OPT. One advantage of LAH, besides its easy implementation, is
that  each  of  its  individual  allocations  is  self-contained.  Thus,  if  in  an  ongoing  experiment  the 
total  number  m  of  observations  has  to  be  changed,  this  has  only  minor  effects  on  its  usage. 

The numerical results for BCK in Miescke and Park (1997a) became feasible with the
release of Microsoft Visual Basic Version 4.0, which makes it possible to handle, on a typical
IBM-type Pentium computer, a 6-dimensional array (for the $a_i$'s and $b_i$'s) with a common subscript
range of $1,2,\dots,15$ (for the $m_i$'s), i.e. more than $10^7$ variables. As anticipated, LAH and OPT
turn out to be good approximations to BCK.

One  striking  fact  has  been  observed  by  comparing  the  first  allocation  of  BCK  with  the 
allocation  m1,m2,m3  of  OPT.  In  all  but  one  of  the  96  parameter  settings  considered  in  the 
three  examples,  the  population  to  which  OPT  allocates  the  largest  sample  size  is  one  of  those 
which BCK would allow to start with. This indicates clearly that the allocation scheme
$(R^*_{m,1},R^*_{m-1,1},\dots,R^*_{2,1},R_1)$, i.e. APP, should be considered as a good approximation to BCK, and
thus be used in practice. A study of the performance of APP within the framework of the three
examples does not appear to be feasible at this time because of the length of the computing
time required for such a task. It would be a combination of calculating the individual steps $R^*_{t,1}$,
randomizing tied populations, and simulating the $m$ outcomes.